Parallel Corpus Development at NVTC

نویسندگان

  • Jocelyn Phillips
  • Carol Van Ess-Dykema
  • Timothy Allison
  • Laurie Gerber
چکیده

In this paper, we describe the methods used to develop an exchangeable translation memory bank of sentence-aligned Mandarin Chinese English sentences. This effort is part of a larger effort, initiated by the National Virtual Translation Center (NVTC), to foster collaboration and sharing of translation memory banks across the Intelligence Community and the Department of Defense. In this paper, we describe our corpus creation process – a largely automated process – highlighting the human interventions that are still deemed necessary. We conclude with a brief discussion of how this work will affect plans for NVTC’s new translation management workflow and future research to increase the performance of the automated components of the corpus creation process.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

متن کامل

Strategies Used in the Translation of Interlingual Subtitling

This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency, as well. Contrary to many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...

متن کامل

Design and compilation of a specialized Spanish-German parallel corpus

This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts. It will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European database of Technical Regulations Information System (TRIS), containing 995 original documents written in German and Spanish and their tra...

متن کامل

Material Development and English for Academic Purposes Word Lists; a Reductionist Approach

Nagy (1988) states that vocabulary is a prerequisite factor in comprehension. Drawing upon a reductionist approach and having in mind the prospects for material development, this study aimed at creating an English for Academic Purposes Word List (EAPWL). The corpus of this study was compiled from a corpus containing 6479 pages of texts, 2,081,678 million tokens (running words) and 63825 types (...

متن کامل

A Web-based English Abstract Writing Tool Using a Tagged E-J Parallel Corpus

In this paper, we present a Web-based English abstract writing tool, the “BEAR (Building English Abstracts by Ricoh).” This English writing tool is aimed at helping Japanese software engineers improve the organization of their writing by enabling them to select a rhetorical template of the target abstract and to build up component sentences while having access to good-quality sample sentences. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010